Abstract

We present the first micro-architectural side-channel attack which
runs entirely in the browser. In contrast to other works in this
genre, this attack does not require the attacker to install any
software on the victim’s machine - to facilitate the attack, the
victim needs only to browse to an untrusted webpage with
attacker-controlled content. This makes the attack model highly
scalable and extremely relevant and practical to today’s web,
especially since most desktop browsers currently accessing the
Internet are vulnerable to this attack. Our attack, which is an
extension of the last-level cache attacks of Yarom et al. [23],
allows a remote adversary to recover information belonging to other
processes, other users and even other virtual machines running on
the same physical host as the victim web browser. We describe the
fundamentals behind our attack, evaluate its performance using a
high-bandwidth covert channel and finally use it to construct a
system-wide mouse/network activity logger. Defending against this
attack is possible, but the required countermeasures can exact an
impractical cost on other benign uses of the web browser and of the
computer.


1 Introduction 

Side channel analysis is a remarkably powerful class of
cryptanalytic attack. It lets attackers extract secret information
hidden inside a secure device by analyzing the physical signals
(power, radiation, heat, etc.) the device emits as it performs a
secure computation [?]. Allegedly used by the intelligence community
as early as World War II, and first discussed in an academic context
by Kocher et al. in 1996 [14], side channel analysis has been shown
to be effective in breaking into myriad real-world systems, from car
immobilizers to high-security cryptographic coprocessors [8][18]. A
particular kind of side-channel attack which is relevant to personal
computers is the cache attack, which exploits the use of cache
memory as a shared resource between different processes or users to
disclose secret information [?].

While the potency of side-channel attacks is established without
question, their application to practical systems is relatively
limited. The main limiting factor to the practicality of
side-channel attacks is the problematic attack model they assume:
with the exception of network-based timing attacks, most
side-channel attacks require that the attacker be in close proximity
to the victim. Cache attacks, in particular, typically assume that
the attacker is capable of executing arbitrary binary code on the
victim’s machine. While this assumption holds for
Infrastructure/Platform-as-a-Service (IaaS/PaaS) environments such
as Amazon’s cloud computing platform, it is less relevant for other
settings.

In this report we challenge this limiting security assumption by
presenting a successful cache attack which assumes a far more
relaxed and practical attacker model. In our attacker model, the
victim merely has to access a website owned by the attacker. Despite
this minimal attack model, we show how the attacker can still launch
an attack in a practical time frame and extract meaningful
information from the system under attack. In keeping with this
computing setting, we chose to focus our attacks not on
cryptographic key recovery but rather on tracking user behavior. The
attacks described in this report are therefore highly practical:
practical in the assumptions and limitations they cast upon the
attacker; practical in the time they take to run; and practical in
terms of the benefit they deliver to the attacker. To the best of
our knowledge, this is the first side-channel attack which can scale
effortlessly into millions of targets.

For our attacks we assume that the victim is using a personal
computer powered by a late-model Intel CPU. We furthermore assume
that the user is accessing the web through a browser with
comprehensive HTML5 support. As we show in Subsection 5.1, this
covers a vast majority of personal computers connected to the
Internet. The victim is coerced to view a webpage containing an
attacker-controlled element such as an advertisement. The attack
code itself, which we describe in more detail in Section 2, executes
a Javascript-based cache attack, which allows it to track accesses
to the last-level cache (LLC) of the victim’s machine over time.
Since this single cache is shared by all CPU cores and by all users,
processes and protection rings, this information can provide the
attacker with detailed knowledge of the user and the system under
attack.

1.1 The Memory Architecture of Modern Intel CPUs

Modern computer systems typically incorporate a high-speed central
processing unit (CPU) and a large amount of lower-speed random
access memory (RAM). To bridge the performance gap between these two
components, modern computer systems make use of cache memory - a
type of memory element with a smaller size but a higher performance,
which contains a subset of the RAM which has been recently accessed
by the CPU. The cache memory is typically arranged in a cache
hierarchy, with a series of progressively larger and slower memory
elements being placed in levels between the CPU and the RAM.
Figure 1, taken from [?], shows the cache hierarchy used by Intel
Ivy Bridge series CPUs, incorporating a small, fast level 1 (L1)
cache, a slightly larger level 2 (L2) cache, and finally a larger
level 3 (L3) cache which is then connected to the RAM. The current
generation of Intel CPUs, code named Haswell, extends this hierarchy
by another level of embedded DRAM (eDRAM), which is not discussed
here. Whenever the CPU wishes to access a memory element, the memory
element is first searched for in the cache hierarchy, saving the
lengthy round-trip to the RAM. If the CPU requires an element which
is not currently in the cache, an event known as a cache miss, one
of the elements currently residing in the cache must be evicted to
make room for this new element.

The Intel cache micro-architecture is inclusive - all elements in
the L1 cache must also exist in the L2 and L3 caches. Conversely, if
a memory element is evicted from the L3 cache, it is also
immediately evicted from the L2 and L1 caches. It should be noted
that the AMD cache micro-architecture is exclusive, and thus the
attacks described in this report are not immediately applicable to
that platform.

This report focusses on the level 3 cache, commonly referred to as
the last-level cache (LLC). Due to the LLC’s relatively large size,
it is not efficient to search its entire contents whenever the CPU
accesses the memory. Instead, the LLC is divided into cache sets,
each covering a fixed subset of the memory space. Each of these
cache sets contains several cache lines. For example, the Intel Core
i7-3720QM processor, which belongs to the Ivy Bridge family,
includes 8192 = 2^13 cache sets, each of which can hold 12 lines of
64 = 2^6 bytes each, giving a total cache size of
8192 x 12 x 64 bytes = 6MB. When the CPU needs to check whether a
given physical address is present in the L3 cache, it calculates
which cache set is responsible for this address, then only checks
the cache lines corresponding to this set. As a consequence, a cache
miss event for a physical address can result in the eviction of only
one of the relatively small number of lines sharing its cache set, a
fact of which we make great use in our attack. The method of mapping
between 64-bit physical addresses and 13-bit cache set indices has
been reverse engineered by Hund et al. in 2013 [?]: of the 64
physical address bits, bits 5 to 0 are ignored, bits 16 to 6 are
taken directly as the lower 11 bits of the set index, and bits 63 to
17 are hashed to form the upper 2 bits of the set index. The LLC is
shared between all cores, threads, processes, users, and even
virtual machines running on a certain CPU chip, regardless of
privilege rings or other similar protection mechanisms.
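As a rough illustration of the address split described above, the following sketch checks the cache size arithmetic and extracts the line offset and the 11 directly mapped set index bits from a physical address. The function and variable names are hypothetical, the sample address is arbitrary, and the hash that produces the upper 2 bits of the set index is deliberately not modeled, since its details are not given in this section.

```javascript
// Cache geometry quoted in the text for the Intel Core i7-3720QM.
const CACHE_SETS = 8192;    // 2^13 cache sets
const LINES_PER_SET = 12;   // cache lines per set
const LINE_SIZE = 64;       // 2^6 bytes per line

// Total LLC size in bytes: 8192 x 12 x 64 = 6 MB.
const cacheSizeBytes = CACHE_SETS * LINES_PER_SET * LINE_SIZE;

// Split a 64-bit physical address (passed as a BigInt) into the three
// fields described in the text. The hash applied to bits 63..17 is
// not modeled here; the raw bits are returned instead.
function splitPhysicalAddress(physAddr) {
  const lineOffset = physAddr & 0x3Fn;              // bits 5..0, ignored
  const lowerSetIndex = (physAddr >> 6n) & 0x7FFn;  // bits 16..6 (11 bits)
  const hashedBits = physAddr >> 17n;               // bits 63..17, input to the hash
  return { lineOffset, lowerSetIndex, hashedBits };
}

const fields = splitPhysicalAddress(0x12345678n);   // arbitrary sample address
console.log(cacheSizeBytes / (1024 * 1024), "MB", fields);
```

Two addresses that differ only in bits 5 to 0 fall into the same cache line, while two addresses with different values in bits 16 to 6 can never share a cache set.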

Modern personal computers use a virtual memory mechanism, in which
user processes do not typically have direct knowledge of or access
to the system’s physical memory. Instead, these processes are
allocated virtual memory pages. When a virtual memory page is
accessed by a currently executing process, the operating system
dynamically associates the page with a page frame in physical
memory. The CPU’s memory management unit (MMU) is in charge of
mapping between the virtual memory accesses made by different
processes and accesses to physical memory. The size of pages and
page frames in most Intel processors is typically set to 4KB, and
both pages and page frames are page aligned - the starting address
of each page is a multiple of the page size. This means that the
lower 12 bits of any virtual address and its corresponding physical
address are generally identical, another fact we use in our attack.

1.2 Cache Attacks 

The cache attack is the most well-known representative of the
general class of micro-architectural attacks, which are defined by
Acıiçmez in his excellent survey [?] as attacks which “exploit
deeper processor ingredients below the trust architecture boundary”
to recover secrets from various secure systems. Cache attacks make
use of the fact that, regardless of higher-level security mechanisms
such as sandboxing, virtual memory, privilege rings, hypervisors
etc., both secure and insecure processes can interact through their
shared use of the cache. This allows an attacker to craft a “spy”
process which can measure and make inferences about the internal
state of a secure process through their shared use of the cache.
First identified by Hu in 1992 [?], several results have shown how
the cache side-channel can be used to recover AES keys [?], RSA keys
[?], and even allow one virtual machine to compromise another
virtual machine running on the same host [20].

Figure 1: The Intel Ivy Bridge Cache Architecture (taken from [?])

Our attack is modeled after the Prime+Probe attack method, first
described by Osvik et al. in [?] in the context of the L1 cache. The
attack was later extended by Yarom et al. in [23] to last-level
caches on systems with large pages enabled, and we extend it in this
work to last-level caches in the more common case of 4K-sized pages.
In general, the Prime+Probe attack follows a four-step pattern. In
the first step, the attacker creates one or more eviction sets. An
eviction set is a set of locations in memory which, when accessed,
can take over a single cache set which is also used by the victim
process. In the second step, the attacker primes the cache set by
accessing the eviction set. This forces the eviction of the victim’s
data or instructions from the cache set and brings it to a known
state. In the third step, the attacker triggers or simply waits for
the victim to execute and potentially utilize the cache. Finally,
the attacker probes the cache set by accessing the eviction set yet
again. A low access latency suggests that the attacker’s code or
data is still in the cache, while a higher access latency suggests
that the victim’s code made use of the cache set, thereby teaching
the attacker about the victim’s internal state. In native-code
attacks, the actual timing measurement is carried out by using the
unprivileged assembler instruction rdtsc, which provides a very
sensitive measurement of the processor’s cycle count. Iterating over
the eviction set, which is stored as a linked list, also serves a
secondary purpose by forcing the cache set yet again into an
attacker-controlled state, thus preparing for the next round of
measurements.
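The four steps can be illustrated with a toy simulation. The sketch below models a single 12-way cache set with least-recently-used (LRU) replacement, which is an assumption made purely for illustration (real replacement policies are more complex), and it counts misses instead of measuring time. All class, function and tag names are hypothetical.

```javascript
// Toy model of one cache set with LRU replacement (illustrative only).
class ToyCacheSet {
  constructor(ways) {
    this.ways = ways;
    this.lines = [];   // least recently used tag first, most recent last
  }

  // Access a tag; returns true on a hit and false on a miss.
  access(tag) {
    const pos = this.lines.indexOf(tag);
    const hit = pos !== -1;
    if (hit) {
      this.lines.splice(pos, 1);              // refresh its LRU position
    } else if (this.lines.length === this.ways) {
      this.lines.shift();                     // evict the LRU line
    }
    this.lines.push(tag);
    return hit;
  }
}

// Used for both priming and probing: walk the eviction set, count misses.
function walkEvictionSet(cacheSet, evictionSet) {
  let misses = 0;
  for (const tag of evictionSet) {
    if (!cacheSet.access(tag)) misses++;
  }
  return misses;
}

function primeAndProbe(victimRuns) {
  const cacheSet = new ToyCacheSet(12);
  // Step 1: create an eviction set with one entry per way.
  const evictionSet = Array.from({ length: 12 }, (_, i) => "attacker" + i);
  walkEvictionSet(cacheSet, evictionSet);          // step 2: prime
  if (victimRuns) cacheSet.access("victim");       // step 3: victim activity
  return walkEvictionSet(cacheSet, evictionSet);   // step 4: probe
}

console.log("misses, idle victim:", primeAndProbe(false));
console.log("misses, active victim:", primeAndProbe(true));
```

In a real attack the miss count is not directly visible; it shows up only as a longer probe time, which is why the timing source matters.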


1.3 The Web Runtime Environment

Javascript is a dynamically typed, object-based scripting language
with runtime evaluation, which powers the client side of the modern
web. Javascript code is delivered to the browser runtime in
source-code form and is compiled and optimized by the browser using
a just-in-time mechanism. The fierce competition between different
browser vendors resulted in an intense focus on improving Javascript
performance. As a result, Javascript code performs in some scenarios
on a level which is on par with that of native code.

The core functionality of the Javascript language is defined by the
ECMA industry association in Standard ECMA-262 [?]. The language
standard is complemented by a large set of application programming
interfaces (APIs) defined by the World Wide Web Consortium [6],
which make the language practical for developing web content. The
Javascript API set is constantly evolving, and browser vendors add
support for new APIs over time according to their own development
schedules. Two specific APIs which are of use to us in this work are
the Typed Array Specification [?], which allows efficient access to
unstructured binary data, and the High Resolution Time API [?],
which provides sub-millisecond time measurements to Javascript
programs. As we show in Subsection 5.1, a large majority of Web
browsers in use today support both of these APIs.

Javascript code runs in a highly sandboxed environment - code
delivered via Javascript has highly restricted access to the system.
For example, Javascript code cannot open files, even for reading,
without the permission of the user. Javascript code cannot execute
native language code or load native code libraries. Most
significantly, Javascript code has no notion of pointers. Thus, it
is impossible to determine even the virtual address of a Javascript
variable.

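The two APIs named in this subsection can be combined as in the following minimal sketch, which stores a small circular linked list inside a typed buffer and times one traversal with the high resolution timer. The buffer size, the offsets and the function name are hypothetical choices made for the example; the sketch only shows the API usage and does not by itself measure cache behavior.

```javascript
// Build a circular linked list inside a typed buffer: each 32-bit
// entry stores the byte offset of the next entry in the list.
const buffer = new ArrayBuffer(4096);
const view = new DataView(buffer);
const offsets = [0, 1024, 2048, 3072];   // hypothetical entry positions
for (let i = 0; i < offsets.length; i++) {
  view.setUint32(offsets[i], offsets[(i + 1) % offsets.length]);
}

// Follow the list once and time the traversal. performance.now()
// returns a timestamp in milliseconds with a fractional part.
function timeTraversal(listView, start) {
  const t0 = performance.now();
  let current = start;
  let steps = 0;
  do {
    current = listView.getUint32(current);
    steps++;
  } while (current !== start);
  const t1 = performance.now();
  return { steps, elapsedMs: t1 - t0 };
}

const result = timeTraversal(view, 0);
console.log(result.steps, "entries visited in", result.elapsedMs, "ms");
```

Because every entry is addressed by a numeric offset into the buffer rather than by a pointer, the code can control the relative layout of its data even though absolute addresses stay hidden.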

1.4 Our Contribution 

Our objective was to craft a last-level cache attack which can be
deployed over the web. This process is quite challenging since
Javascript code cannot load shared libraries or execute native
language programs, and since Javascript code is forced to make
timing measurements using scripting language function calls instead
of using dedicated assembler instructions. These challenges
notwithstanding, we have been able to successfully extend cache
attacks to the web-based environment:

• We present a novel method of creating a non-canonical eviction set
for the last-level cache. In contrast to [23], our method does not
require the system to be configured for large page support, and
as such can immediately be applied to a wider variety of desktop
and server systems. We show that our method runs in a practical
time even when implemented in Javascript.

• We present a fully functional last-level cache attack using
unprivileged Javascript code. We evaluate its performance using a
covert channel method, both between different processes running on
the same machine and between a VM client and its host. The nominal
capacity of the Javascript-based channel is on the order of
hundreds of kilobits per second, comparable to that of the native
code approach of [?].

• We show how cache-based methods can be used to effectively track
the behavior of the user. This application of cache attacks is
more relevant to our attack model than the cryptanalytic
applications often explored in other works.

• Finally, we describe possible countermeasures to our attack and
discuss their systemwide cost.

Document Structure: In Section 2 we present the design and
implementation of the different steps of our attack methodology. In
Section 3 we present a covert channel constructed using our attack
methodology and evaluate its performance. In Section 4 we
investigate the use of cache-based attacks for tracking user
behavior both inside and outside the browser. Finally, the last
section concludes the paper with a discussion of countermeasures and
open research challenges.

2 Attack Methodology 

As described in the previous section, the four steps involved in a
successful Prime+Probe attack are: creating an eviction set for one
or more relevant cache sets, priming the cache set, triggering the
victim operation and finally probing the cache set again. While the
actual priming and probing are fairly straightforward to implement,
finding cache sets which correlate to interesting system behaviors
and creating eviction sets for them is less trivial. In this section
we describe how each of these steps was implemented in Javascript.

2.1 Creating an Eviction Set 

2.1.1 Design 

As stated in [23], the first step of a Prime+Probe attack is to
create an eviction set for a certain desired cache set shared with a
victim process. This eviction set consists of a set of variables
which are all mapped by the CPU into the same cache set. These
variables are stored as a linked list; the use of a linked list is
meant to defeat the CPU’s memory prefetching and pipelining
optimizations, as suggested by [20]. We first show how we create an
eviction set for an arbitrary cache set, and later address the
problem of finding which cache set is shared with the victim.

As discussed in [?], the L1 cache determines the set assignment for
a variable based on the lower bits of its virtual address. Since the
attacker is assumed to know the virtual addresses of its own
variables, it was thus straightforward to create an eviction set in
the L1 attack model. In contrast, set assignments for variables in
the LLC are made by reference to their physical memory addresses,
which are not generally available to an unprivileged process. The
authors of [23] partially circumvented this problem by assuming that
the system is operating in large page mode, in which the lower 21
bits of the physical and virtual addresses are identical, and by the
additional use of an iterative algorithm to resolve the unknown
upper (slice) bits of the cache set index.

In the attack model we consider, the system is running in the
traditional 4K page mode, where only the lower 12 bits of the
physical and virtual addresses are identical. To add to our
difficulty, Javascript has no notion of pointers, so even the
virtual addresses of our own variables are unknown to us.

The mapping of 64-bit physical memory addresses into 13-bit cache
set indices was investigated by Hund et al. in [?]. They discovered
that accessing a contiguous 8MB “eviction buffer” of physical memory
will completely invalidate all cache sets in the L3 cache. While we
could not allocate such an eviction buffer in user mode (indeed, the
work of Hund et al. was assisted by a kernel-mode driver), we
allocated an 8MB byte array in virtual memory using Javascript
(which was assigned by the operating system into an arbitrary and
non-contiguous set of 4K physical memory pages), and measured the
system-wide effects of iterating over this buffer. We discovered
that access latencies to unrelated variables in memory were slowed
down by a noticeable amount when we accessed them immediately after
iterating through this eviction buffer. We also discovered that the
slowdown effect persisted even if we did not access the entire
buffer, but rather accessed it at offsets of once per every 64
bytes. However, it was not immediately clear how to map each of the
131K offsets we accessed inside this eviction buffer into each of
the 8192 possible cache sets, since we did not know the physical
memory locations of the various pages of our buffer.

A naive approach to solving this problem would be to fix an
arbitrary “victim” address in memory, then find by brute force which
set of 12 out of the 131K offsets share a set with this address. To
do so, we could fix some subset of the 131K offsets, then measure
whether the access latency to this victim address is increased after
iterating through these offsets. If the latency increases, this
means the subset contains the 12 addresses sharing the set with the
victim address. If the latency does not change, then the subset does
not contain at least one of these 12 addresses, allowing the victim
address to remain in the cache. By repeating this process 8192
times, each time with a different victim address, we would be able
to identify each cache set and create our data structure.

An immediate application of this heuristic would take an
impractically long time to run. Fortunately, the page frame size of
the Intel MMU, as described in Subsection 1.1, could be used to our
great advantage. Since virtual memory is page aligned, the lower 12
bits of each virtual memory address are identical to the lower 12
bits of the corresponding physical memory address. According to Hund
et al., 6 of these 12 bits are used in uniquely determining the
cache set index. Thus, an offset in our eviction buffer cannot share
a cache set with all 131K other offsets, but rather only with the 2K
offsets (131K/64) which have the same address bits 11 to 6. In
addition, discovering a single cache set can immediately teach us
about 63 additional cache sets located in the same page frame.
Joined with the discovery that Javascript allocates large data
buffers along page frame boundaries, this led to the greedy
algorithm described in Algorithm 1.
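The size of the reduced search space follows directly from the figures quoted in the text (an 8MB buffer, 64-byte cache lines and 4K pages). The sketch below works through that arithmetic and lists the offsets that can possibly collide with a given offset, assuming a page-aligned buffer. All names are hypothetical.

```javascript
// Sizes quoted in the text.
const bufferSize = 8 * 1024 * 1024;   // 8 MB eviction buffer
const lineSize = 64;                  // bytes per cache line
const pageSize = 4096;                // 4K pages

const totalOffsets = bufferSize / lineSize;               // line-sized offsets in the buffer
const setsPerPage = pageSize / lineSize;                  // cache sets touched by one page
const candidatesPerOffset = totalOffsets / setsPerPage;   // offsets sharing bits 11..6

// Bits 11..6 of an offset inside a page-aligned buffer. These equal
// the same bits of the (unknown) physical address.
function knownSetIndexBits(bufferOffset) {
  return (bufferOffset >> 6) & 0x3f;
}

// All buffer offsets that can possibly share a cache set with
// `bufferOffset`: one per page, at the same position inside the page.
function candidateOffsets(bufferOffset) {
  const inPage = (bufferOffset % pageSize) & ~(lineSize - 1);
  const result = [];
  for (let base = 0; base < bufferSize; base += pageSize) {
    result.push(base + inPage);
  }
  return result;
}

const candidates = candidateOffsets(3 * pageSize + 128);
console.log(totalOffsets, setsPerPage, candidatesPerOffset, candidates.length);
```

Since the remaining index bits depend on the physical page frame, the profiling step only has to decide which pages conflict, not which individual offsets.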

By running Algorithm 1 multiple times, we can gradually create
eviction sets covering most of the cache, except for those parts
which are accessed by the Javascript runtime itself. We note that,
in contrast to the eviction sets created by the algorithm of [23],
our eviction set is non-canonical - since Javascript has no notion
of pointers, we cannot identify which of the CPU’s cache sets
corresponds to any particular eviction set we discover. Furthermore,
running the algorithm multiple times on the same system will result
in a different mapping each time it is run. This property stems from
the use of traditional 4K pages instead of large 2MB pages, and will
hold even if the eviction sets are created using native code and not
Javascript.


Algorithm 1 Profiling a cache set

Let S be the set of unmapped pages, and address x be an arbitrary
page-aligned address in memory

1. Repeat k times:

   (a) Iteratively access all members of S

   (b) Measure t_1, the time it takes to access x

   (c) Select a random page s from S and remove it

   (d) Iteratively access all members of S \ s

   (e) Measure t_2, the time it takes to access x

   (f) If removing page s caused the memory access to speed up
       considerably (i.e., t_1 - t_2 > thres), then this page is
       part of the same set as x. Place it back into S.

   (g) If removing page s did not cause memory access to speed up
       considerably, then this address is not part of the same set
       as x.

2. If |S| = 12, return S. Otherwise report failure.
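The greedy reduction in Algorithm 1 can be sketched against a simulated machine. The sketch makes two simplifications that are assumptions of this example and not part of the algorithm above: timing is replaced by a deterministic model in which x is slow to access whenever at least 12 of the accessed pages conflict with it, and every candidate page is tried exactly once instead of sampling k random pages. All names and latency values are hypothetical.

```javascript
// Simulated timing (an assumption standing in for real measurements):
// x is evicted, and therefore slow to access, whenever at least `ways`
// of the accessed pages map to the same cache set as x.
function makeSimulatedProbe(conflictingPages, ways) {
  return function accessTimeOfX(pages) {
    let conflicts = 0;
    for (const p of pages) {
      if (conflictingPages.has(p)) conflicts++;
    }
    return conflicts >= ways ? 100 : 10;   // made-up miss and hit latencies
  };
}

// Greedy reduction in the spirit of Algorithm 1.
function profileCacheSet(allPages, accessTimeOfX, threshold, ways) {
  const S = new Set(allPages);
  for (const s of allPages) {
    const t1 = accessTimeOfX(S);         // steps (a) and (b)
    S.delete(s);                         // step (c)
    const t2 = accessTimeOfX(S);         // steps (d) and (e)
    if (t1 - t2 > threshold) S.add(s);   // step (f): s shares the set with x
    // step (g): otherwise s stays removed
  }
  return S.size === ways ? S : null;     // step 2
}

// 200 candidate pages, of which 20 secretly conflict with x.
const pages = Array.from({ length: 200 }, (_, i) => i);
const hiddenConflicts = new Set(pages.filter((p) => p % 10 === 3));
const evictionSet = profileCacheSet(
  pages, makeSimulatedProbe(hiddenConflicts, 12), 50, 12);
console.log(evictionSet === null ? "failure" : [...evictionSet]);
```

With real measurements the comparison against the threshold is noisy, which is presumably why the algorithm checks the final set size and reports failure when it is not exactly 12.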


2.1.2 Evaluation 

We implemented Algorithm 1 in Javascript and evaluated it on Intel
machines using CPUs from the Ivy Bridge, Sandy Bridge and Haswell
families, running the latest versions of Safari and Firefox on Mac
OS Yosemite and Ubuntu 14.04 LTS, respectively. The systems were not
configured to use large pages, but instead were running with the
default 4K page size. The code snippet in Listing 1 shows steps 1.d
and 1.e of the algorithm, and demonstrates how we iterate over the
linked list and measure latencies using Javascript. The algorithm
requires some additional steps to run under Chrome and under
Internet Explorer, which we describe in Subsection 5.1.

Figure 2 shows the performance of the profiling algorithm, as
evaluated on an Intel i7-3720QM running Firefox 35.0.1 for Mac OS
10.10.2. We were pleased to find that the algorithm was able to map
more than 25% of the cache in under 30 seconds of operation, and
more than 50% of the cache after 1 minute. The algorithm seems very
simple to parallelize, since most of the execution time is spent on
data structure maintenance and only a minority of it is spent in the
actual invalidate-and-measure portion. The entire algorithm fits
into less than 500 lines of Javascript code.

To verify that our algorithm was indeed capable of identifying cache
sets, we designed an experiment that compares the access latencies
for a flushed and an un-flushed variable. Figure 3 shows two
probability distribution functions comparing the time required to
access a variable which has recently been flushed from the cache
using our method (gray line) with the time required to access a
variable which currently resides in the cache set (black line). The
timing measurements were carried out using Javascript’s high
resolution timer, and thus include the additional delay imposed by
the Javascript runtime. It is clear to see that the two
distributions are distinguishable, confirming the correct operation
of our profiling method. Figure 4 shows a similar plot captured on
an older-generation Sandy Bridge CPU, which includes 16 entries per
cache set.

    // Invalidate the cache set by walking the linked list
    var currentEntry = startAddress;
    do {
      currentEntry = probeView.getUint32(currentEntry);
    } while (currentEntry != startAddress);

    // Measure access time
    var startTime = window.performance.now();
    currentEntry = primeView.getUint32(variableToAccess);
    var endTime = window.performance.now();

Listing 1: Javascript code to invalidate a cache set, then measure
access time

Figure 2: Cumulative performance of the profiling algorithm

Figure 3: Probability distribution of access times for flushed vs.
un-flushed variable (Haswell CPU)

Figure 4: Probability distribution of access times for flushed vs.
un-flushed variable (Sandy Bridge CPU). X axis: access latency (ns).

By selecting a group of cache sets and repeatedly measuring their
access latencies over time, the attacker is provided with a very
detailed picture of the real-time activity of the cache. We call the
visual representation of this image a “memorygram”, since it looks
quite similar to an audio spectrogram.

A sample memorygram, collected over an idle period of 400ms, is
presented in Figure 5. The X axis corresponds to time, while the Y
axis corresponds to different cache sets. The sample shown has a
temporal resolution of 250µs and monitors a total of 128 cache sets.
The intensity of each pixel corresponds to the access latency of
this particular cache set at this particular time, with black
representing a low latency, indicating no other process accessed
this cache set between the previous measurement and this one, and
white representing a higher latency, suggesting that the attacker’s
data was evicted from the cache between this measurement and the
previous one.
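The mapping from latency samples to pixel intensities can be sketched as follows. The linear scaling between two reference latencies is an assumption made for this example (the text does not specify how intensities are derived), and the function names and sample values are hypothetical.

```javascript
// Convert raw probe latencies into memorygram pixel intensities.
// latencies[t][s] is the probe latency of cache set s at time step t.
// lowLatency and highLatency are assumed reference values for a
// cached and an evicted set, respectively.
function buildMemorygram(latencies, lowLatency, highLatency) {
  return latencies.map((row) =>
    row.map((value) => {
      const scaled = (value - lowLatency) / (highLatency - lowLatency);
      const clamped = Math.min(1, Math.max(0, scaled));
      return Math.round(clamped * 255);   // 0 = black (idle), 255 = white (active)
    })
  );
}

// Transpose so that each row is one cache set (Y axis) and each
// column is one time step (X axis), matching the figure layout.
function toImageRows(memorygram) {
  return memorygram[0].map((_, s) => memorygram.map((row) => row[s]));
}

const samples = [
  [40, 42, 120],    // time step 0: set 2 was evicted
  [41, 118, 119],   // time step 1: sets 1 and 2 were evicted
];
const image = toImageRows(buildMemorygram(samples, 40, 120));
console.log(image);
```

At the resolution quoted above, a 400ms capture of 128 sets sampled every 250µs would give an image of 1600 columns by 128 rows.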

Observing this memorygram can provide several immediate insights.
First, it is clear to see that despite the use of Javascript timers
instead of machine language instructions, measurement jitter is
quite low, and active and inactive sets are clearly differentiated.
It is also easy to notice several vertical line segments in the
memorygram, indicating multiple adjacent cache sets which were all
active during the same time period. Since consecutive cache sets
(within the same page frame) correspond to consecutive addresses in
physical memory, we believe this signal indicates the execution of a
function call which spans more than 64 bytes of assembler
instructions. Several smaller groups of cache sets are also accessed
together. We theorize that these smaller groups correspond to
variable accesses. Finally, the white horizontal line indicates a
variable which is constantly accessed during our measurements. This
variable probably belongs to the measurement code or to the
underlying Javascript runtime. It is remarkable that such a wealth
of information about the system is available to an unprivileged
webpage!

2.2 Identifying Interesting Regions in the Cache

The eviction set allows the attacker to monitor the activity of
arbitrary sets of the cache. Since the eviction set we receive is
non-canonical, the attacker must now correlate the cache sets he has
profiled to data or code locations belonging to the victim. This
learning/classification problem was addressed earlier by Zhang et
al. in [?] and by Yarom et al. in [23], where various machine
learning methods such as SVM were used to derive meaning from the
output of cache latency measurements.

To effectively carry out the learning step, the attacker needs to
induce the victim to perform an action, then examine which cache
sets were touched by this action, as formally defined in
Algorithm 2.

Algorithm 2 Interesting Regions in the Cache

Let S_i be the data structure matched to eviction set i

1. For each set i:

   (a) Iteratively access all members of S_i to prime the cache set

   (b) Measure the time it takes to iteratively access all members
       of S_i

   (c) Perform an interesting operation

   (d) Measure once more the time it takes to iteratively access all
       members of S_i

   (e) If performing the interesting operation caused the access
       time to slow down considerably, then the operation was
       associated with cache set i.

Finding a function for step (c) of the algorithm was actually quite
challenging due to the limited permissions granted to Javascript
code. This can be contrasted with the ability of Apecechea et al. to
trigger a minimal kernel operation by invoking an empty sysenter
call [?]. To carry out this step, we had to survey the Javascript
runtime to discover function calls which may trigger interesting
behavior, such as file access, network access, memory allocation,
etc. We were also interested in functions which take a relatively
short time to run and leave no background “tails”, such as garbage
collection, which would impact our measurement in step (d). Several
such functions were discovered in a different context by Ho et al.
in [10]. Another approach would be to induce the user to perform an
interesting behavior (such as pressing a key on his keyboard) on
behalf of the attacker. The learning process in this case might be
structured (where the attacker knows exactly when the victim
operation was executed), or unstructured (where the attacker can
only assume that relatively busy periods of system activity are due
to victim operations). We make use of both of these approaches in
the attack we present in Section 4.


Since our code will always detect activity caused by the Javascript
runtime, the high performance timer code, and other components of
the web browser which are running regardless of the call being
executed, we actually called two similar functions and examined the
difference between the activity profile of the two evaluations to
identify relevant cache sets.
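Algorithm 2 and the differencing step can be sketched together as below. The probe and operation callbacks are hypothetical and caller-supplied; here they are backed by a small simulation with made-up latencies, in which an operation evicts the attacker’s data from a fixed list of cache sets. This is a sketch under those assumptions and not the measurement code used in the evaluation.

```javascript
// Algorithm 2: report the cache sets whose probe time rises after
// `operation()` runs. `probe(i)` walks eviction set i and returns
// the elapsed time.
function findActiveSets(setCount, probe, operation, threshold) {
  const active = [];
  for (let i = 0; i < setCount; i++) {
    probe(i);                                         // (a) prime the cache set
    const before = probe(i);                          // (b) baseline measurement
    operation();                                      // (c) interesting operation
    const after = probe(i);                           // (d) measure again
    if (after - before > threshold) active.push(i);   // (e)
  }
  return active;
}

// Keep only the sets that are active for the target operation but not
// for a similar baseline operation, so that shared noise cancels out.
function differentialSets(activeForTarget, activeForBaseline) {
  const baseline = new Set(activeForBaseline);
  return activeForTarget.filter((i) => !baseline.has(i));
}

// Simulated environment: running the operation marks the listed sets
// as evicted, which makes their next probe slow (100 instead of 10).
function simulate(setsTouched) {
  const dirty = new Set();
  return {
    probe: (i) => (dirty.delete(i) ? 100 : 10),
    operation: () => setsTouched.forEach((i) => dirty.add(i)),
  };
}

const target = simulate([1, 4, 9]);   // background noise (1, 4) plus signal (9)
const baseline = simulate([1, 4]);    // background noise only
const relevant = differentialSets(
  findActiveSets(16, target.probe, target.operation, 50),
  findActiveSets(16, baseline.probe, baseline.operation, 50)
);
console.log(relevant);
```

The first probe in each iteration doubles as the priming step, which mirrors the observation that walking the eviction set returns the cache set to an attacker-controlled state.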
Figure 5: Sample memorygram



3 A Cache-Based Covert Channel in Javascript

3.1 Motivation 

As shown in [?], last-level cache access patterns can be used to
construct a high-bandwidth covert channel and effectively exfiltrate
sensitive information between virtual machines co-resident on the
same physical host. In our particular attack model, in which the
attacker is not in a co-resident virtual machine but rather inside a
webpage, the motivation for a covert channel is different but still
very interesting.

By way of motivation, let us assume that a Security Agency is
tracking the criminal mastermind Bob. Making use of a spear phishing
campaign, the Agency installs a piece of software of its own
choosing, commonly referred to as an Advanced Persistent Threat
(APT), on Bob’s personal computer. The APT is designed to log
incriminating information about Bob and send it to the Agency’s
secret servers. Bob is however highly security-savvy, and is using
an operating system which enforces strict Information Flow Tracking
[24]. This operating system feature prevents the APT from accessing
the network after it accesses any file containing private user data.

Javascript-based cache attacks can immediately be put
to use to allow the Agency to operate in such a scenario,
as long as Bob can be enticed to view a website controlled
by the Security Agency. Instead of transmitting
the private user data over the network, the APT will use
the cache side-channel to communicate with the malicious
website, without setting off the flow tracking capabilities
of the operating system.

This case study is inspired by the “RF retro-reflector”
design attributed to a certain Security Agency, in which a
collection device such as a microphone does not transmit
the collected signal directly, but instead modulates the
collected signal onto an “illuminating signal” sent to it
by an external “collection device”.

3.1.1 Design 

The design of our covert channel system was influenced
by two requirements: first, we wanted the transmitter part
to be as simple as possible, and in particular we did not
want it to carry out the eviction set algorithm described
in Section 2. Second, since the receiver’s eviction set is
non-canonical, it should be as simple as possible for the
receiver to search for the sets onto which the transmitter
was modulating its signal.

To satisfy these requirements, our transmitter/APT
simply allocates a 4K array in its own memory and
continuously modulates the collected data into the pattern
of memory accesses to this array. There are 64 cache
sets covered by this 4K array, allowing the APT to transmit
64 bits per time period. To make sure the memory
accesses are easily located by the receiver, the same access
pattern is repeated in several additional copies of
the array. Thus, a considerable percentage of the cache
is actually exercised by the transmitter, in contrast to the
method of [23], which assumes a canonical eviction set
and thus only activates two lines.
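
The mapping from data bits to memory accesses can be sketched as follows. This is an illustration only, written in Python for readability: Python gives no real control over the cache, so the sketch shows just the bit-to-offset arithmetic. The 64-byte line size is derived from the figures above (4K divided by 64 sets); the function names and the assumption that the copies are laid out at page-aligned offsets are hypothetical.

```python
PAGE_SIZE = 4096                      # the 4K array from the text
NUM_SETS = 64                         # cache sets covered by one array
LINE_SIZE = PAGE_SIZE // NUM_SETS     # 64 bytes per set (derived)


def offsets_for_word(word, copies=1):
    """Byte offsets to access so that bit i of `word` activates set i."""
    if not 0 <= word < (1 << NUM_SETS):
        raise ValueError("word must fit in 64 bits")
    offsets = []
    for copy in range(copies):
        base = copy * PAGE_SIZE       # assumption: copies are page-aligned
        for bit in range(NUM_SETS):
            if (word >> bit) & 1:
                offsets.append(base + bit * LINE_SIZE)
    return offsets


def word_from_active_sets(active_sets):
    """Receiver side: rebuild the 64-bit word from the active set indices."""
    word = 0
    for set_index in active_sets:
        word |= 1 << set_index
    return word


def transmit(buffer, word, copies):
    """Touch one byte in every selected line (memory pattern only)."""
    for offset in offsets_for_word(word, copies):
        buffer[offset] ^= 1


buffer = bytearray(PAGE_SIZE * 2)
transmit(buffer, 0b101, copies=2)
```

Repeating the pattern in every copy means the receiver only has to find one matching page frame to recover the whole word.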

The receiver code profiles the system’s physical memory,
then searches for one of the page frames containing
the data modulated by the APT. The data can then be
demodulated from the memory access pattern and uploaded
back to the server, all without violating the information
flow tracking protections.

3.1.2 Evaluation 

Our attacker model assumes that the transmitter part is
written in a (relatively fast) native language, while the
receiver part is implemented in Javascript. Thus, we
assumed that the limiting factor in the performance of our
system is the sampling speed of the malicious website.

Figure 6: A host-to-host covert channel


Figure 7: A host-to-VM covert channel


To evaluate the bandwidth of this covert channel, we
wrote a simple program that iterates over memory in a
predetermined pattern (in our case, a bitmap containing
the word “Usenix”). Next, we attempted to search for
this memory access pattern using a Javascript cache attack,
then measured the maximum sampling frequency at
which the Javascript code could be run.

Figure 6 shows a memorygram capturing an execution
of this covert channel. The nominal bandwidth of
the covert channel was measured to be approximately
320kbps, a figure which compares well with the 1.2Mbps
bandwidth achieved by the native code cross-VM covert
channel implemented by [23].

Figure 7 shows a similar memorygram where the receiver
code is not running directly on the host, but rather
on a virtual machine (Firefox 34 running on Ubuntu
14.01 inside VMWare Fusion 7.1.0). While the peak
bandwidth in this scenario was severely degraded
to approximately 8kbps, the fact that a webpage running
inside a virtual machine is capable of probing the underlying
hardware is still quite surprising.

4 User Behavior Tracking Through Cache Attacks

Most works which evaluate cache attacks assume that the
attacker and the victim share a colocated machine inside
a cloud-provider data center. Such a machine is not typically
configured to accept interactive input, and accordingly
most works in this field focus on the recovery of
cryptographic keys or other secret state elements, such
as random number generator states. For this work,
we chose to examine how cache attacks can be used to
track the interactive behavior of the user, a threat which
is more relevant to the attack model we consider. We note
that [20] have already attempted to track keystroke timing
events using coarse-grained measurements of system
load on the L1 cache.

This case study shows how a malicious webpage can
track a user’s activity using a cache attack. In the attack
presented below, we assume that the user has loaded
a malicious webpage in a background tab or window,
and is carrying out sensitive operations in another tab,
or even in a completely different application with no
Internet connectivity.

We chose to focus on mouse and network activity because
the operating system code that handles them is
non-negligible. Thus, we expected them to have a relatively
large cache footprint. They are also easily triggered
by content running within the restricted Javascript
security model, as we describe below.

4.1 Design 

The structure of both attacks is similar. First, the profiling
phase is carried out, allowing the attacker to probe
individual cache sets using Javascript. Next, during a
training phase, the activity to be detected (i.e. network
activity or mouse activity) is triggered, and the cache
activity is sampled multiple times with a very high temporal
resolution. While the network activity was triggered
directly by the measurement script (by executing a network
request), we simply waved the mouse around over
the webpage during the training period.*

By comparing the cache activity during the idle and
active periods of the training phase, the attacker learns
which cache sets are uniquely active during the relevant
activity and trains a classifier on these cache sets. Finally,
during the classification phase, the attacker monitors the
interesting cache sets over time to learn about the user’s
activity.

*In a full attack, the user can be enticed to move the mouse by
having him play a game or fill out a form.

We used a basic unstructured training process, assuming
that the most intensive operation performed by the
system during the training phase would be the one being
measured. To take advantage of this property, we calculated
the Hamming weight of each measurement over
time (equivalent to the count of cache sets which are active
during a certain time period), then applied k-means
clustering to these Hamming weights to divide the
measurements into several clusters. We then calculated the
mean access latency of each cache set in every cluster,
arriving at a centroid for each cluster. To classify an
unknown measurement vector, we measured the Euclidean
distance between this vector and each of these centroids,
assigning it to the closest one.
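
The training and classification steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the classifier used in the experiments: the miss threshold, the evenly spaced initialization of the k-means centers, all function names and the sample latencies are made up for the example.

```python
import math


def hamming_weight(measurement, miss_threshold):
    """Count the cache sets whose latency indicates activity."""
    return sum(1 for latency in measurement if latency > miss_threshold)


def nearest(value, centers):
    """Index of the center closest to a scalar value."""
    return min(range(len(centers)), key=lambda c: abs(value - centers[c]))


def kmeans_1d(values, k, iterations=20):
    """Plain k-means on scalars, with evenly spaced initial centers."""
    if k < 2:
        raise ValueError("k must be at least 2")
    low, high = min(values), max(values)
    centers = [low + (high - low) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        labels = [nearest(v, centers) for v in values]
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return [nearest(v, centers) for v in values]


def train_centroids(measurements, k, miss_threshold):
    """Cluster by Hamming weight, then average latencies per cluster."""
    weights = [hamming_weight(m, miss_threshold) for m in measurements]
    labels = kmeans_1d(weights, k)
    centroids = {}
    for cluster in set(labels):
        members = [m for m, label in zip(measurements, labels)
                   if label == cluster]
        centroids[cluster] = [sum(col) / len(members)
                              for col in zip(*members)]
    return centroids


def classify(measurement, centroids):
    """Label of the centroid at the smallest Euclidean distance."""
    return min(centroids,
               key=lambda c: math.dist(measurement, centroids[c]))


# Hypothetical latencies (ns) for four cache sets: two idle samples
# followed by two samples taken while the activity was triggered.
training = [
    [50, 50, 50, 50],
    [52, 48, 50, 51],
    [50, 120, 130, 50],
    [51, 125, 118, 49],
]
centroids = train_centroids(training, k=2, miss_threshold=100)
idle_label = classify([49, 51, 50, 50], centroids)
active_label = classify([50, 119, 121, 50], centroids)
```

Clustering on the scalar Hamming weight keeps the training step cheap, while the per-set centroids retain which cache sets distinguish the clusters.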

In the classification phase, we generated network traffic
using the command-line tool wget and moved the
mouse outside of the browser window. To provide
ground truth for the network activity scenario, we
concurrently measured the traffic on the system using
tcpdump, then mapped the timestamps logged by tcpdump
to the times detected by our classifier. To provide ground
truth for the mouse activity scenario, we wrote a webpage
that timestamps and logs all mouse events, then
moved the mouse over this webpage. We stress that the
mouse-logging webpage was run on a different browser
(Chrome) than the measuring code (Firefox).

4.2 Evaluation 

The results of the activity measurement are shown in
Figures 8 and 9. The top part of both figures shows the
real-time activity of a subset of the cache. On the bottom part
of each figure are the classifier outputs, together with the
ground truth which was collected externally. As the figures
show, our extremely simple classifier was quite capable
of detecting mouse and network activity. The performance
of the attack can undoubtedly be improved
by using more advanced training and classification
techniques. We stress that the mouse activity detector did not
detect network activity, and vice versa.

The classifier’s measurement rate was only 500Hz. As
a result, it could not count individual packets, but could
only detect periods of network activity and inactivity. In
contrast, our mouse detection code actually logged more
events than the ground truth collection code. This is due to
the fact that the Chrome browser throttles mouse events to
web pages down to a rate of approximately 60Hz.

Detecting network activity can be a stepping stone toward
a deeper insight into the user’s activity, as famously
demonstrated by Chen et al. In essence, while
Chen et al. assumed a network-level attacker which can
monitor all incoming and outgoing traffic to the victim,
the techniques presented here can enable any malicious
website to monitor the concurrent web activities of its
users. The attack can be bolstered by more indicators,
such as memory allocations (as explored in prior work),
DOM layout events, disk writes and so on.

Figure 8: Network activity detection

Figure 9: Mouse activity detection

5 Discussion 

This work shows that side-channel attacks have a much
wider reach than previously expected. Instead of being
relevant only for very specific attacker scenarios, the attack
proposed here can be mounted against most computers
connected to the Internet. The fact that so many
systems are suddenly vulnerable to side-channel attacks
suggests that side-channel resistant algorithms and systems
should be the norm, rather than the exception.

5.1 Prevalence of Vulnerable Systems 

Our attack requires a personal computer powered by
an Intel CPU based on the Sandy Bridge, Ivy Bridge,
Haswell or Broadwell micro-architectures. According
to data from IDC, more than 80% of all PCs sold after
2011 satisfy this requirement. We furthermore assume
that the user is using a web browser which supports the
HTML5 High Resolution Time API and the Typed Arrays
specification. Table 1 notes the earliest version at
which these APIs are supported for each of the common
browser brands, as well as the proportion of global
Internet traffic coming from vulnerable browser versions,
according to StatCounter GlobalStats measurements as
of January 2015. As the table shows, more than 80%
of desktop browsers in use today are vulnerable to the
attack we describe.

The effectiveness of our attack depends on being able
to perform precise measurements using the Javascript
High Resolution Time API. While the W3C recommendation
of this API specifies that a high-resolution
timestamp should be “a number of milliseconds accurate
to a thousandth of a millisecond”, the maximum resolution
of this value is not specified, and indeed varies
between browser versions and operating systems. In our
testing we discovered, for instance, that the actual resolution
of this timestamp for Safari for MacOS was on the
order of nanoseconds, while Internet Explorer for Windows
had a 0.8µs resolution. Chrome, on the other hand,
offered a uniform resolution of 1µs on all operating systems
we tested.
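
One simple way to estimate the effective resolution of a timer is to read it repeatedly and record the smallest nonzero step between consecutive readings. The sketch below illustrates this idea in Python and is not necessarily how the figures above were obtained; the function names are assumptions, and the quantized test clock merely stands in for a browser timer with a coarse tick.

```python
def estimate_resolution(clock, samples=10000):
    """Smallest nonzero step seen between consecutive clock() readings."""
    smallest = None
    previous = clock()
    for _ in range(samples):
        current = clock()
        step = current - previous
        if step > 0 and (smallest is None or step < smallest):
            smallest = step
        previous = current
    return smallest


def make_quantized_clock(quantum):
    """Fake clock that advances by 1 per call but reports whole quanta."""
    state = {"ticks": 0}

    def clock():
        state["ticks"] += 1
        return (state["ticks"] // quantum) * quantum

    return clock


# A clock that only updates every 5 units appears to have resolution 5.
observed = estimate_resolution(make_quantized_clock(5), samples=100)
print(observed)
```

The result is an upper bound on the tick size: a timer whose readings are spaced further apart than its tick will appear coarser than it really is.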

Since, as shown earlier, the timing difference between
a single cache hit and a single cache miss is on
the order of 50ns, the profiling and measurement algorithms
need to be slightly modified to support systems
with coarser-grained timing resolution. In the profiling
stage, instead of measuring a single cache miss we repeat
the memory access cycle multiple times to amplify the
time difference. For the measurement stage, we cannot
amplify a single cache miss, but we can take advantage
of the fact that code access typically invalidates multiple
consecutive cache sets from the same page frame. As
long as at least 20 out of the 64 cache sets in a single
page frame register a cache miss, our attack is successful
even with microsecond time resolution.
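
The numbers in this paragraph can be related by simple arithmetic: at roughly 50ns per miss, 20 misses add up to about 1µs, which is one tick of a microsecond-resolution timer. The Python sketch below works through this under the assumption that miss penalties simply accumulate; the function names and the boolean per-set representation are hypothetical.

```python
import math

SETS_PER_PAGE = 64       # cache sets in a single page frame
MIN_MISSES = 20          # threshold quoted in the text
MISS_PENALTY_NS = 50     # approximate hit/miss timing difference


def misses_needed(timer_resolution_ns, miss_penalty_ns=MISS_PENALTY_NS):
    """Misses to accumulate before the delay spans one timer tick."""
    return math.ceil(timer_resolution_ns / miss_penalty_ns)


def page_frame_active(set_missed):
    """True if enough of the 64 sets in a page frame registered a miss."""
    if len(set_missed) != SETS_PER_PAGE:
        raise ValueError("expected one flag per cache set in the page")
    return sum(set_missed) >= MIN_MISSES


print(misses_needed(1000))   # 1 microsecond timer
print(misses_needed(800))    # 0.8 microsecond timer
```

The same arithmetic applies to the profiling stage, where repeating the access cycle accumulates enough delay to be visible at the timer's resolution.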

The attack we propose is also easily applied to mobile
devices such as smartphones and tablets. It should
be noted that the Android Browser supports High Resolution
Time and Typed Arrays starting from version 4.4,
but at the time of writing the most recent version of iOS
Safari (8.1) did not support the High Resolution Time
API.

5.2 Countermeasures 

The attacks described in this report are possible because
of a confluence of design and implementation decisions
starting at the micro-architectural level and ending at the
Javascript runtime: the method of mapping a physical
memory address to a cache set; the inclusive cache
micro-architecture; Javascript’s high-speed memory access and
high-resolution timer; and finally, Javascript’s permission
model. Mitigation steps can be applied at each of
these junctions, but each will impose a drawback on the
benign uses of the system.

On the micro-architectural level, changes to the way
physical memory addresses are mapped to cache lines
will severely confound our attack, which makes great use
of the fact that 6 of the lower 12 bits of the address are
used directly to select a cache set. Similarly, a move
to an exclusive cache micro-architecture, instead of an
inclusive one, will make it impossible for our code to
trivially evict entries from the L1 cache, making measurement
much more difficult. These two design decisions,
however, were made deliberately to make the
CPU more efficient in its design and in its use of cache
memory, and changing them will exact a performance
cost on many other applications. In addition, modifying
a CPU’s micro-architecture is far from trivial, and
definitely impossible as an upgrade to already deployed
hardware.

On the Javascript level, it seems that somewhat reducing
the resolution of the high-resolution timer will
make this attack more difficult to launch. However, the
high-resolution timer was created to address a real need
of Javascript developers for applications ranging from
music and games to augmented reality and telemedicine.

Browser brand       High Resolution   Typed Arrays   Worldwide
                    Time Support      Support        prevalence
Internet Explorer   10                11             11.77%
Safari              8                 6              1.86%
Chrome              20                7              50.53%
Firefox             15                4              17.67%
Opera               15                12.1           1.2%
Total               -                 -              83.03%

Table 1: Prevalence of vulnerable desktop browsers, according to
StatCounter GlobalStats


A possible stopgap measure would be to restrict access
to this timer to applications which gain the user’s consent
(for example, by displaying a confirmation window)
or the approval of some third party (for example, by being
downloaded from a trusted “app store”).

An interesting approach could be the use of heuristic
profiling to detect and prevent this specific kind of attack.
Just as the abundance of arithmetic and bitwise instructions
was used by Wang et al. to indicate the existence
of cryptographic primitives, it can be noted that the
various measurement steps of our attack access memory
in a very particular pattern. Since modern Javascript runtimes
already scrutinize the runtime performance of code
as part of their profile-guided optimization mechanisms,
it should be possible for the Javascript runtime to detect
profiling-like behavior from executing code and then
modify its response accordingly (for example by jittering
the high-resolution timer, dynamically moving arrays
around in memory, etc).
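
As a purely hypothetical illustration of the timer-jittering response mentioned above, the following Python sketch coarsens a timestamp to a fixed quantum and then adds random jitter within that quantum. It is not taken from any browser implementation; the function name, the integer microsecond units and the choice of uniform jitter are all assumptions.

```python
import random


def fuzzy_timestamp(true_time_us, resolution_us, rng):
    """Coarsen a timestamp to `resolution_us`, then jitter within the tick.

    The returned value always lies in the same tick as the true time, so
    differences shorter than one tick carry no reliable timing signal.
    """
    coarse = (true_time_us // resolution_us) * resolution_us
    return coarse + rng.randrange(resolution_us)


rng = random.Random(0)  # seeded only to make the example repeatable
readings = [fuzzy_timestamp(t, 1000, rng) for t in (0, 250, 999, 1000, 2500)]
```

Note that readings taken within the same tick are no longer guaranteed to be monotonic, which is one of the costs such a defense would impose on benign code.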

5.3 Conclusion 

In this report, we showed how the micro-architectural
side-channel attack, which is already recognized as
an extremely potent attack method, can be effectively
launched from an untrusted web page. Instead of the
traditional cryptanalytic application of the cache attack,
we showed how user behavior can be effectively
tracked using this method. The potential reach of
side-channel attacks has been extended, meaning that additional
classes of secure systems must be designed with
side-channel countermeasures in mind.